BioOptimizer: a Bayesian scoring function approach to motif discovery
نویسندگان
چکیده
MOTIVATION Transcription factors (TFs) bind directly to short segments on the genome, often within hundreds to thousands of base pairs upstream of gene transcription start sites, to regulate gene expression. The experimental determination of TFs binding sites is expensive and time-consuming. Many motif-finding programs have been developed, but no program is clearly superior in all situations. Practitioners often find it difficult to judge which of the motifs predicted by these algorithms are more likely to be biologically relevant. RESULTS We derive a comprehensive scoring function based on a full Bayesian model that can handle unknown site abundance, unknown motif width and two-block motifs with variable-length gaps. An algorithm called BioOptimizer is proposed to optimize this scoring function so as to reduce noise in the motif signal found by any motif-finding program. The accuracy of BioOptimizer, which can be used in conjunction with several existing programs, is shown to be superior to using any of these motif-finding programs alone when evaluated by both simulation studies and application to sets of co-regulated genes in bacteria. In addition, this scoring function formulation enables us to compare objectively different predicted motifs and select the optimal ones, effectively combining the strengths of existing programs. AVAILABILITY BioOptimizer is available for download at www.fas.harvard.edu/~junliu/BioOptimizer/
منابع مشابه
BioOptimizer: Improving Models for Discovery of Transcription Factor Binding Motifs
The experimental determination of TF binding sites is expensive and timeconsuming. Many motif-finding programs have been developed but no program is clearly superior in all situations. Practitioners often find it difficult to judge which of the motifs predicted by these algorithms are more likely to be biologically relevant. We derive a comprehensive scoring function based on a full Bayesian mo...
متن کاملGAME: detecting cis-regulatory elements using a genetic algorithm
MOTIVATION Identification of a transcription factor binding sites is an important aspect of the analysis of genetic regulation. Many programs have been developed for the de novo discovery of a binding motif (collection of binding sites). Recently, a scoring function formulation was derived that allows for the comparison of discovered motifs from different programs [S.T. Jensen, X.S. Liu, Q. Zho...
متن کاملBayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites
Finding conserved motifs in genomic sequences represents one of essential bioinformatic problems. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI-a sequential Monte Carlo motif-identification algorithm, which is based on a position weight matrix model that d...
متن کاملComputational Discovery of Gene Regulatory Binding Motifs: A Bayesian Perspective
The Bayesian approach together with Markov chain Monte Carlo techniques has provided an attractive solution to many important bioinformatics problems such as multiple sequence alignment, microarray analysis and the discovery of gene regulatory binding motifs. The employment of such methods and, more broadly, explicit statistical modeling, has revolutionized the field of computational biology. A...
متن کاملWebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches
WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 20 10 شماره
صفحات -
تاریخ انتشار 2004